Discovering Regularities in Databases Using Canonical Decomposition of Binary Relations
Authors
Abstract
Regularities in databases are directly useful for knowledge discovery and data summarization. As a mathematical foundation, relational algebra has helped to uncover the main data structures and the dependencies between attributes in a relational database. Functional, difunctional, and other kinds of dependencies in a relational database describe invariant regular structures that have been used extensively for database decomposition and for minimizing redundancy. In this paper, we explain why “concepts”, or “maximal rectangles”, should be considered the atomic regular structure for decomposing a binary relation, a decomposition that is useful in many applications. More specifically, we have observed experimentally that “optimal concepts” contain pertinent information about the data, which we have successfully exploited for machine learning, dynamic and incremental database organization, text summarization, data reduction, and even for modeling human thinking. Operators on concepts need to be developed because of their general usefulness in data and information engineering. We propose a canonical decomposition of binary relations based on two operators, f and g, to model some important open problems, for example how to formulate, as an equation, the optimal conceptual coverage of a binary relation. We first develop an algorithm to find a conceptual coverage of a binary relation. We then exploit Riguet’s difunctional relation to express, as an equation, all isolated pairs in a binary relation. By iteratively using these isolated pairs, we obtain several efficient solutions to the canonical decomposition problem.
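The abstract does not define the operators f and g or give the coverage algorithm, so the following is only a minimal sketch under stated assumptions. It assumes that f and g are the standard derivation operators of formal concept analysis (f maps a set of objects to the attributes they all share, g maps a set of attributes to the objects having all of them), so that a concept, or maximal rectangle, is a pair (A, B) with f(A) = B and g(B) = A. The coverage step uses a greedy "largest uncovered rectangle first" heuristic, which is a common approach but not necessarily the authors' algorithm. The sketch also checks Riguet's difunctionality condition R;R⁻¹;R ⊆ R; it does not reproduce the paper's equation for isolated pairs, which the abstract does not state. All function names (`f`, `g`, `concepts`, `pairs`, `greedy_coverage`, `compose`, `inverse`, `is_difunctional`) are hypothetical.

```python
from collections.abc import Collection
from itertools import combinations

# A binary relation R ⊆ X × Y, stored as a set of (object, attribute) pairs.
Relation = set[tuple[str, str]]
Rect = tuple[frozenset[str], frozenset[str]]


def f(R: Relation, objs: Collection[str]) -> frozenset[str]:
    """Attributes related to every object in objs (assumed FCA derivation operator)."""
    attrs = {y for _, y in R}
    return frozenset(y for y in attrs if all((x, y) in R for x in objs))


def g(R: Relation, attrs: Collection[str]) -> frozenset[str]:
    """Objects related to every attribute in attrs (the dual operator)."""
    objs = {x for x, _ in R}
    return frozenset(x for x in objs if all((x, y) in R for y in attrs))


def concepts(R: Relation) -> set[Rect]:
    """All concepts (A, B) with f(A) = B and g(B) = A, found by brute force.

    Every concept is (g(f(S)), f(S)) for some object subset S, so closing all
    subsets finds them all. Exponential in |X|: suitable for toy relations only.
    """
    objs = sorted({x for x, _ in R})
    found: set[Rect] = set()
    for k in range(len(objs) + 1):
        for subset in combinations(objs, k):
            B = f(R, subset)
            found.add((g(R, B), B))
    return found


def pairs(rect: Rect) -> Relation:
    """The pairs covered by the rectangle A × B."""
    A, B = rect
    return {(x, y) for x in A for y in B}


def greedy_coverage(R: Relation) -> list[Rect]:
    """Cover R with maximal rectangles, each time taking the one covering most uncovered pairs."""
    # Non-empty concepts only, sorted so that ties are broken deterministically.
    rects = sorted((c for c in concepts(R) if c[0] and c[1]),
                   key=lambda c: (sorted(c[0]), sorted(c[1])))
    uncovered = set(R)
    cover: list[Rect] = []
    while uncovered:
        # Each pair (x, y) lies in the concept (g(f({x})), f({x})), so the best gain is >= 1.
        best = max(rects, key=lambda c: len(pairs(c) & uncovered))
        cover.append(best)
        uncovered -= pairs(best)
    return cover


def compose(R: Relation, S: Relation) -> Relation:
    """Relational composition R;S = {(x, z) | (x, y) in R and (y, z) in S for some y}."""
    return {(x, z) for x, y in R for y2, z in S if y == y2}


def inverse(R: Relation) -> Relation:
    """The converse relation R⁻¹."""
    return {(y, x) for x, y in R}


def is_difunctional(R: Relation) -> bool:
    """Riguet's condition: R is difunctional iff R;R⁻¹;R ⊆ R."""
    return compose(compose(R, inverse(R)), R) <= R


if __name__ == "__main__":
    R = {("o1", "a"), ("o1", "b"), ("o2", "a"), ("o2", "b"), ("o3", "c")}
    print(is_difunctional(R))   # True: two disjoint blocks {o1,o2}×{a,b} and {o3}×{c}
    R2 = R | {("o2", "c")}
    print(is_difunctional(R2))  # False: (o3, a) is in R2;R2⁻¹;R2 but not in R2
    for A, B in greedy_coverage(R2):
        print(sorted(A), sorted(B))
    # ['o1', 'o2'] ['a', 'b']
    # ['o2', 'o3'] ['c']
```

Enumerating every subset of objects keeps the sketch short but scales exponentially; a practical implementation would use a dedicated concept-enumeration algorithm. Because finding a minimum rectangular cover is NP-hard, the greedy loop is an approximation that may use more rectangles than necessary.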
Similar resources
Discovering Test Set Regularities in Relational Domains
Machine learning typically involves discovering regularities in a training set, then applying these learned regularities to classify objects in a test set. In this paper we present an approach to discovering additional regularities in the test set, and show that in relational domains such test set regularities can be used to improve classification accuracy beyond that achieved using the trainin...
Efficient Discovery of Functional Dependencies and Armstrong Relations
In this paper, we propose a new efficient algorithm called Dep-Miner for discovering minimal non-trivial functional dependencies from large databases. Based on theoretical foundations, our approach combines the discovery of functional dependencies along with the construction of real-world Armstrong relations (without additional execution time). These relations are small Armstrong relations taki...
Analysing Binary Associations
This paper describes how binary associations in databases of items can be organised and clustered. Two similarity measures are presented that can be used to generate a weighted graph of associations. Each measure focuses on different kinds of regularities in the database. By calculating a Minimum Spanning Tree on the graph of associations, the most significant associations can be discovered and...
Separation-Based Adsorption of H2 from Binary Mixtures inside Single, Double, Triple Walled Boron-Nitride Nanotubes: A Grand-Canonical Monte-Carlo Study
This study investigates the separation based on adsorption of the binary gas mixture of hydrogen with biogas (gases: CO2, CH4, O2, N2) and inert gases (gases: He, Ne, and Ar) using single-walled ((7,7), (15,15), (29,29), (44,44), (58,58) and (73,73) SWBNNTs), double-walled ((11,11)@(15,15), (7,7)@(22,22) DWBNNTs) and triple-walled ((8,8)@(11,11)@(15,15) and (7,7)@(15,15)@(22,22) ...
Using difunctional relations in information organization
An algorithm for information organization based on rectangular decomposition of a binary relation is introduced. Rectangular decomposition allows a classification of databases presented as a binary relation. This problem, being NP-complete, has been the subject of several previous works. However, we found it necessary to propose an approximate polynomial algorithm and to give a...